Detecting communities in a network, based only on the adjacency matrix, is a problem of interest
to several scientific disciplines. Recently, Zhang and Moore have introduced an algorithm in [P.
Zhang and C. Moore, Proceedings of the National Academy of Sciences 111, 18144 (2014)], called
mod-bp, that avoids overfitting the data by optimizing a weighted average of modularity (a popular
goodness-of-fit measure in community detection) and entropy (i.e. number of configurations with a
given modularity). The adjustment of the relative weight, the “temperature” of the model, is crucial
for getting a correct result from mod-bp. In this work we study the many phase transitions that
mod-bp may undergo by changing the two parameters of the algorithm: the temperature T and the
maximum number of groups q. We introduce a new set of order parameters that make it possible to
determine the actual number of groups $\hat{q}$, and we observe on both synthetic and real networks
the existence of phases with any $\hat{q} \in \{1, \dots, q\}$, which were unknown before. We discuss
how to interpret the results of mod-bp and how to make the optimal choice for the problem of
detecting significant communities.

PACS numbers: 64.60.aq,89.20.-a 


I. INTRODUCTION 

In community detection, the goal is to regroup nodes of an observed network into different
groups (or communities) of nodes believed to be similar, and thus to find a meaningful partition
of the network. The assumption that this is possible comes from the hypothesis that the structure
of the graph reflects hidden attributes of the nodes, which can therefore be inferred. Though recent
studies show that such an assumption does not hold in general for real networks [1], generative
models with this property, such as the stochastic block model [2] (SBM), have proved the efficiency
of community detection algorithms [3]. Different classes of community detection algorithms exist:
among the popular approaches, algorithms relying on Bayesian inference fit the parameters of an
assumed generative model to the observed network Em, while spectral algorithms find communities
from the eigenvectors of a matrix based on the adjacency matrix of the network ElEl-.

The hypothesis most commonly made is that of assortative networks, which means that nodes with
the same hidden attributes are more likely to be linked than nodes with different attributes.
Under this hypothesis, a popular measure of the goodness of a partition is the modularity, and
therefore various community detection algorithms rely on modularity maximization EHIO]. Recently
the authors of [11] (called ZM hereafter) introduced such an algorithm that avoids the common
pitfall of overfitting: indeed, maximizing modularity predicts communities even in unstructured
(i.e. random) networks. The only free parameters in the mod-bp algorithm are the number of groups
q and a temperature-like parameter T. Three ranges of temperatures are identified that correspond
to phases in which the algorithm has qualitatively different behaviours: at high temperatures no
division in groups is found, at intermediate temperatures meaningful groups are found and for low
enough temperatures the algorithm does not converge. In this paper, we broaden the picture given
in ZM by showing that there are in general more than three phases. We show that, although the
number q of groups is passed to the mod-bp algorithm, it can spontaneously return a partition with
a smaller number of groups $\hat{q} < q$. We introduce a new set of order parameters that makes it
possible to determine $\hat{q}$, and observe both on synthetic and real networks the existence of
phases with different values of $\hat{q} \in \{1, \dots, q\}$.

We will use the following notations: N is the number of nodes in the network, $\mathcal{E}$ is
the set of m undirected edges, and we write $\langle ij \rangle \in \mathcal{E}$ if an edge is
present between nodes i and j. The degree $d_i$ of a node is the number of edges that link node i
to other nodes. A partition of the network is a set $\{t\}$, where $t_i \in \{1, \dots, q\}$ is
the group node i belongs to. q is the maximum number of groups.

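The modularity of a partition $\{t\}$ is defined by [12]

$$Q(\{t\}) = \frac{1}{m} \left( \sum_{\langle ij \rangle \in \mathcal{E}} \delta_{t_i, t_j} - \sum_{\langle ij \rangle} \frac{d_i d_j}{2m} \delta_{t_i, t_j} \right), \tag{1}$$

where $\delta$ is the Kronecker delta function. High values of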
modularity indicate that there are more edges between nodes of the same group than between
nodes of different groups: thus, the higher the modularity, the better the partition. The
advantage of modularity is that it makes no assumption on the way the network was generated,
but only that it has an assortative structure. This encourages its use on real networks, in
which the true generative process is generally unknown, and has led to several algorithms
performing community detection by maximization of modularity IMO].
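As an illustration, the modularity of a given partition can be evaluated directly from an edge list. The following is a minimal sketch, not code from ZM: the function name `modularity` and the toy graph are hypothetical, and the sketch assumes the standard Newman-Girvan normalization, $Q = \sum_g [e_g/m - (D_g/2m)^2]$, where $e_g$ is the number of edges inside group g and $D_g$ the summed degree of its nodes. With this normalization a single-group partition has exactly zero modularity.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman-Girvan modularity of a partition (assumed normalization).

    edges: list of undirected edges (i, j), each listed once.
    partition: sequence such that partition[i] is the group of node i.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)   # number of edges inside each group
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if partition[i] == partition[j]:
            internal[partition[i]] += 1
    group_degree = defaultdict(int)  # summed degree of each group
    for node, d in degree.items():
        group_degree[partition[node]] += d
    return sum(internal[g] / m - (group_degree[g] / (2 * m)) ** 2
               for g in group_degree)


# Toy graph (hypothetical): two triangles joined by a single edge.
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(round(modularity(toy_edges, [0, 0, 0, 1, 1, 1]), 3))  # 0.357
```

Splitting the toy graph into its two triangles gives Q = 5/14, whereas putting all nodes in one group gives Q = 0.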

One drawback of modularity is that finding the partition with highest modularity is a discrete
combinatorial optimization problem [13], which rapidly becomes intractable as N increases, so
effective heuristics have to be developed. Another drawback is that modularity maximization is
prone to overfitting: it is possible to find high-modularity partitions even in Erdős-Rényi random
graphs [14], although by construction they do not contain an underlying group structure [15-17].
Finally, there exists a fundamental resolution limit [18] that prevents the recovery of
small-sized groups.

ZM introduces a new community-detection algorithm based on modularity maximization, tackling
the first two drawbacks mentioned above, and proposing a multi-resolution strategy to overcome
the third. The algorithm, called mod-bp, is scalable, i.e. of polynomial complexity with respect
to N, and the authors show that it does not overfit, in the sense that it does not return
high-modularity partitions for Erdős-Rényi networks.

This is achieved by treating modularity maximization as a statistical physics problem with an
energy

$$E(\{t\}) = -m\, Q(\{t\}) \tag{2}$$

at a finite temperature $T = 1/\beta$. In this way, every partition $\{t\}$ is given a probability
taken from the Gibbs distribution

$$P(\{t\}) = \frac{1}{Z}\, e^{-\beta E(\{t\})}, \tag{3}$$

where Z is the partition function

$$Z = \sum_{\{t\}} e^{-\beta E(\{t\})}. \tag{4}$$
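To make the role of the temperature concrete, the Gibbs weights can be computed by brute force when the energies of all partitions are available, which is only feasible for very small networks. The sketch below is purely illustrative (it is not how mod-bp works): the function names are hypothetical, and the input is assumed to be a list of energies $E = -mQ$, one per partition.

```python
import math


def gibbs_probabilities(energies, temperature):
    """Return exp(-E/T) / Z for each energy in the list."""
    beta = 1.0 / temperature
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)       # partition function (up to the constant shift)
    return [w / z for w in weights]


def average_energy(energies, temperature):
    """Thermal average of the energy under the Gibbs distribution."""
    probs = gibbs_probabilities(energies, temperature)
    return sum(p * e for p, e in zip(probs, energies))


# Hypothetical energies of three configurations.
toy_energies = [-1.0, 0.0, 0.0]
for temperature in (10.0, 1.0, 0.1):
    print(temperature, average_energy(toy_energies, temperature))
```

Lowering T concentrates the weight on low-energy (high-modularity) partitions, so the average energy decreases, while at high T all partitions become nearly equiprobable and the entropy dominates.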


To solve the problem of sampling from the Gibbs distribution (3), ZM proposes a belief
propagation (BP) algorithm [19, 20], in which so-called messages $\psi_{i \to k}^{t}$ are sent
between all pairs of nodes $(ik)$, for q different groups t. We refer the reader to ZM for a
precise description of the algorithm. After convergence of the BP algorithm, marginals $\psi_i^t$
are obtained from the messages. $\psi_i^t$ represents the probability that node i belongs to
group t, and the most likely group for node i is therefore:

$$\hat{t}_i = \arg\max_t \psi_i^t. \tag{5}$$


Using this maximization, the maximum a posteriori modularity $Q^{\mathrm{MAP}}$ corresponding
to the assignment $\{\hat{t}\}$ can be calculated as

$$Q^{\mathrm{MAP}} = Q(\{\hat{t}\}).$$

As the algorithm samples from the distribution (3), one can also define an average modularity
$Q^{\mathrm{MARG}}$ that is calculated from the marginals instead of the most-likely partition,
and which is proportional to the average energy of the model
^ log Zi- log Zi,
where $Z_i$ and $Z_{ij}$ are the normalizations of the marginals and the two-point correlation
functions respectively, and

While the problem of maximizing modularity is equivalent to finding the ground state of (2),
sampling from (3) at a finite temperature corresponds to minimizing the corresponding free
energy. This means taking into account not only the modularity, but also the entropy, which
counts the number of partitions with a given modularity. In this way, instead of focusing on a
single partition, mod-bp at finite T returns a partition that is a good consensus of the many
existing high-modularity partitions, as advocated in [21].


II. PHASE TRANSITIONS 

As in numerous statistical physics problems, (3) may lead to phase transitions at some given
temperatures. Using modularity as an energy function is similar to studying a Potts-like
statistical mechanics problem [22], for which Ref. [23] has shown that a phase transition is
always present. ZM reports that temperature ranges define three different regimes of the
algorithm. At very low temperatures, the system is in a spin glass phase, in which the algorithm
does not converge to a fixed point. At high temperature, the system is in a paramagnetic phase
in which the fixed point is trivial and all nodes have an equal probability 1/q of belonging to
any of the groups. In networks with statistically significant communities, there is an
intermediate temperature range called recovery phase, in which the algorithm converges to a
non-trivial fixed point, from which group assignments can be obtained using (5).

Here, we broaden this picture by showing that the recovery phase itself can be divided in up
to q - 1 phases, with $2 \le \hat{q} \le q$. At a temperature separating two phases, there is an
order parameter that becomes vanishingly small as T increases, and the number of iterations
needed by the algorithm to reach the fixed point diverges.


A. Model-based critical temperatures 

Modularity as a measure of goodness of a partition is particularly appealing for real networks,
because it makes no assumption about an underlying model that generates the network. Though
appealing, this absence of model is problematic when it comes to determining the best temperature
at which to run mod-bp (i.e. there is no Bayes-optimal temperature). In ZM two generative models
are analyzed, which yields two useful characteristic temperatures:

1. In the configuration model, a network is built by randomly creating links between nodes of
known degree, until all nodes have the right number of neighbours. ZM shows that in this model,
the phase transition between the spin-glass and the paramagnetic phase takes place at

$$T^*(q) = \left[ \ln\left( \frac{q}{\sqrt{c} - 1} + 1 \right) \right]^{-1}, \tag{7}$$

where c is the average excess degree, calculated from the average degree $\langle d \rangle$ and
the average squared degree $\langle d^2 \rangle$, and given by

$$c = \frac{\langle d^2 \rangle}{\langle d \rangle} - 1. \tag{8}$$

2. In the stochastic block model (SBM) [2], the nodes are grouped into q* equal-sized groups,
and for each pair of nodes (ij), a link is created with probability $p_{rs}$ if i belongs to
group r and j belongs to group s. In the simplest case, we take $p_{rs} = p_{\mathrm{out}}$ if
$r \neq s$ and $p_{rs} = p_{\mathrm{in}}$ if r = s. One often considers networks with sparse
connectivity, i.e. the average number of links between a node i from group r and all the nodes
from group s, $c_{rs}$, does not grow with the size of the network. ZM shows that mod-bp is as
successful as a Bayes-optimal algorithm [53] and that the transition between the paramagnetic
phase and the recovery phase takes place at

$$T_R(\epsilon) = \left[ \ln\left( \frac{q\,[1 + (q-1)\epsilon]}{c(1-\epsilon) - [1 + (q-1)\epsilon]} + 1 \right) \right]^{-1}, \tag{9}$$

where $\epsilon = p_{\mathrm{out}}/p_{\mathrm{in}}$.
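The characteristic temperatures of the two models above are simple closed-form expressions and can be evaluated as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the formulas are our reading of the expressions reported by ZM, namely $T^*(q) = [\ln(q/(\sqrt{c}-1) + 1)]^{-1}$ and $T_R(\epsilon) = [\ln(q[1+(q-1)\epsilon]/(c(1-\epsilon) - [1+(q-1)\epsilon]) + 1)]^{-1}$, with $T_0$ taken as the $\epsilon \to 0$ limit of $T_R$.

```python
import math


def excess_degree(degrees):
    """Average excess degree c = <d^2>/<d> - 1 of a degree sequence."""
    n = len(degrees)
    mean_d = sum(degrees) / n
    mean_d2 = sum(d * d for d in degrees) / n
    return mean_d2 / mean_d - 1.0


def t_star(q, c):
    """Spin-glass/paramagnetic transition (assumed formula, needs c > 1)."""
    return 1.0 / math.log(q / (math.sqrt(c) - 1.0) + 1.0)


def t_recovery(q, c, eps):
    """Paramagnetic/recovery transition (assumed formula).

    Only meaningful when c * (1 - eps) > 1 + (q - 1) * eps.
    """
    a = 1.0 + (q - 1.0) * eps
    return 1.0 / math.log(q * a / (c * (1.0 - eps) - a) + 1.0)


def t_zero(q, c):
    """Upper reference temperature: eps -> 0 limit of t_recovery."""
    return t_recovery(q, c, 0.0)


print(round(t_star(2, 4.0), 3))  # 0.91
print(round(t_zero(2, 4.0), 3))  # 1.958
```

As a consistency check, for q = 2 and c = 4 the two transitions coincide, $T_R(\epsilon) = T^*$, at $\epsilon = 1/3$, which is where $c(1-\epsilon) = \sqrt{c}\,[1 + (q-1)\epsilon]$.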

For real networks, the stochastic block model is usually a bad model, and the recommendation
of ZM is to run the algorithm at T*, which seems to always lie inside of the recovery phase. We
can also note that the $\epsilon \to 0$ limit of (9),

$$T_0 = \left[ \ln\left( \frac{q}{c - 1} + 1 \right) \right]^{-1}, \tag{10}$$

is a useful upper bound for T. Indeed, above this temperature, the algorithm converges to the
paramagnetic solution even for networks composed of disconnected components, and is therefore
useless.

B. Degenerate groups

In the paramagnetic phase, the marginal $\psi_i^t$ of every node i of the network is equal to
1/q for all t, up to some minor fluctuations due to the numerical precision of the machine or
incomplete convergence of the algorithm. Due to those fluctuations, calculating a retrieval
configuration with (5) is in general still possible, and would lead to a very small but
non-vanishing retrieval modularity $Q^{\mathrm{MAP}}$.

However, the meaning of the paramagnetic phase is that all groups are strictly equivalent or
degenerate, and therefore $Q^{\mathrm{MAP}}$ should be exactly zero. In order to obtain this, the
algorithm has to check for degenerate groups before assigning a group to each node, and assign
the same “effective” group to nodes for which the maximization (5) leads to different, but
actually degenerate groups.

To check if groups are degenerate, we can look at the following distance between two groups k
and l:

$$d_{kl} = \frac{1}{N} \sum_{i=1}^{N} \left( \psi_i^k - \psi_i^l \right)^2. \tag{11}$$

If $d_{kl}$ is smaller than a chosen threshold $d_{\min}$, we can consider that group k and
group l are degenerate and that they should not be distinguished.

An effective number of groups, $\hat{q}$, can then be defined as the number of distinguishable
groups. We can define a mapping $\varphi$ between the q groups used by the algorithm and the
$\hat{q}$ distinguishable groups: for each group k, $\varphi(k)$ is an integer between 1 and
$\hat{q}$ representing one of the effective groups, and

$$\forall (k, l), \quad \varphi(k) = \varphi(l) \Leftrightarrow d_{kl} < d_{\min}. \tag{12}$$

FIG. 1. (Color online) $d_{12}$, $d_{23}$ and das as a function of temperature. In order to
follow the groups at different temperatures, the temperature is increased step by step, and the
messages are initialized with the final values they reached at the last temperature. We see that
the group distances $d_{kl}$ are like order parameters undergoing a phase transition at different
temperatures, where they drop by more than ten orders of magnitude. Due to this phase transition,
it is easy to choose a threshold $d_{\min}$ in Eq. (12). The dataset is “political books”, run
with q = 6.

FIG. 2. Matrices of distances between groups for different temperatures. The dataset is
“political books”, the algorithm is run with q = 6 for T = 0.26, 0.9, 1.0, 1.28, 2.1, 2.3 (from
left to right). We observe the formation of a growing cluster of groups that are equivalent,
allowing us to define a number of effective groups $\hat{q}$, that varies from 6 at low
temperature (left) to 1 in the paramagnetic phase (right). Note that the area of the squares is
not related to the number of nodes contained in the groups.

With this mapping, we replace the group assignment procedure (5) by

$$\hat{t}_i = \varphi\left( \arg\max_t \psi_i^t \right). \tag{13}$$

With this assignment procedure, $Q^{\mathrm{MAP}}$ is strictly zero in the paramagnetic phase,
because all nodes belong to the same group.
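The degeneracy check and the modified assignment rule can be sketched as follows. This is an illustration rather than the authors' implementation: the function names are hypothetical, groups are labelled from 0 instead of 1, the distance $d_{kl}$ is assumed to be the mean squared difference between the marginals of groups k and l, and degenerate groups are merged transitively with a small union-find structure so that the mapping is a proper equivalence relation.

```python
def group_distances(marginals):
    """d[k][l] = mean over nodes of (psi_i^k - psi_i^l)^2 (assumed form).

    marginals[i][t] is the probability that node i belongs to group t.
    """
    n = len(marginals)
    q = len(marginals[0])
    return [[sum((row[k] - row[l]) ** 2 for row in marginals) / n
             for l in range(q)] for k in range(q)]


def effective_groups(marginals, d_min):
    """Return the mapping phi (list of length q) and the effective number
    of groups, merging every pair of groups with distance below d_min."""
    q = len(marginals[0])
    d = group_distances(marginals)
    parent = list(range(q))

    def find(k):
        while parent[k] != k:
            k = parent[k]
        return k

    for k in range(q):
        for l in range(k + 1, q):
            if d[k][l] < d_min:
                parent[find(l)] = find(k)
    roots = sorted({find(k) for k in range(q)})
    label = {root: idx for idx, root in enumerate(roots)}
    phi = [label[find(k)] for k in range(q)]
    return phi, len(roots)


def assign_groups(marginals, phi):
    """Effective group of each node: phi applied to the argmax marginal."""
    return [phi[max(range(len(row)), key=row.__getitem__)]
            for row in marginals]


# Hypothetical marginals: groups 0 and 1 are degenerate.
psi = [[0.4, 0.4, 0.2], [0.1, 0.1, 0.8], [0.45, 0.45, 0.1]]
phi, q_hat = effective_groups(psi, d_min=1e-6)
print(phi, q_hat, assign_groups(psi, phi))  # [0, 0, 1] 2 [0, 1, 0]
```

With uniform marginals (the paramagnetic fixed point), all groups are merged, the effective number of groups is 1 and every node receives the same label, so the modularity of the resulting partition vanishes.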

Fig. 1 shows that choosing a threshold $d_{\min}$ is meaningful because $d_{kl}$ undergoes a
phase transition at which it sharply drops by several orders of magnitude.

Interestingly, group degeneracy is not only observed in the paramagnetic phase, but also inside
the retrieval phase. In that case, not all groups are degenerate, but only a subset of them.
Fig. 2 shows this for the popular network “political books” 12^ ^, on which mod-bp was run at
different temperatures.


III. EXISTING DOMAINS OF PHASES 

Thanks to the group assignment procedure in Eq. (13), up to q + 1 phases can exist for any
network on which mod-bp is run with q groups: one for each $\hat{q} \in \{1, \dots, q\}$, plus a
spin glass phase. Fig. 3 shows this for the network “political books”. On this network, several
phases coexist at low temperature, whereas for higher temperatures, the phases exist in well
separated temperature intervals. In the latter case, we can define a ‘critical’ temperature
$T_k$, separating the phase with $\hat{q} = k$ from the one with $\hat{q} = k + 1$. As can be
seen in Fig. 3, the number of iterations needed for mod-bp to converge greatly increases around
these critical temperatures. As noted before, $T_0$ is a good reference temperature, and
normalizing all temperatures by $T_0$ is a good way to compare critical temperatures $T_k$ for
the same network with different q values, and also for comparing different networks.


[Figure: curves labelled $Q^{\mathrm{MARG}}$, $Q^{\mathrm{MAP}}$, # iter and # groups.]
FIG. 3. (Color online) Modularities and numbers of effective groups $\hat{q}$ obtained by
sweeping a temperature range from 0 to 1.2 $T_0$ on the dataset “political books” with q = 6.
The vertical lines indicate the positions of T* (left) and $T_0$ (right). Above T*, the changes
in $\hat{q}$ define quite homogeneous phases, separated by sharp transitions, where the number of
iterations necessary to reach convergence increases greatly. Below $T/T_0 \approx 0.4$, the phase
is not homogeneous: depending on the starting conditions, $\hat{q}$ can be 4, 5 or 6. Note that
the modularity increases only minimally when $\hat{q}$ exceeds 3, which agrees with the fact that
q* = 3.

A. Location of critical points $T_k$

In some cases, a subset of n critical temperatures $T_k$ can be degenerate, in which case there
is a phase transition between a phase with $\hat{q} = k$ and a phase with $\hat{q} = k + n$. For
instance, this is the case in networks generated by the stochastic block model with the same
in-connectivity $p_{\mathrm{in}}$ inside each of the q* groups (Fig. 4, above). This agrees with
the description of the three phases given in ZM.
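Synthetic networks of the kind discussed here can be sampled with a few lines of code. The sketch below is a naive $O(N^2)$ sampler written for illustration only: the function name and arguments are hypothetical, groups are equal-sized, and the link probability is assumed to be $p_{rs} = c_{rs}\, q / N$, so that $c_{rs}$ is approximately the average number of links between a node of group r and the nodes of group s.

```python
import random


def sample_sbm(n, c, seed=None):
    """Sample a sparse stochastic block model with equal-sized groups.

    n: number of nodes.
    c: q x q matrix of connectivities c[r][s] (assumed symmetric).
    Returns (edges, group), with edges a list of pairs (i, j), i < j.
    """
    rng = random.Random(seed)
    q = len(c)
    group = [i * q // n for i in range(n)]  # contiguous equal-sized blocks
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, c[group[i]][group[j]] * q / n)
            if rng.random() < p:
                edges.append((i, j))
    return edges, group


# Equal in-connectivity in every group (degenerate critical temperatures);
# a small N is used here only to keep the example fast.
edges, group = sample_sbm(300, [[30, 2, 2], [2, 30, 2], [2, 2, 30]], seed=1)
print(len(edges))
```

Giving the diagonal entries of `c` different values, for instance 30, 15 and 15, produces the non-degenerate situation discussed next.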

In contrast, in networks generated by the SBM with group-dependent in-connectivities $p_{rr}$,
the degeneracy is lifted (Fig. 4, below). The figure also shows that, starting above $T_0$ (i.e.
in the paramagnetic phase) and lowering the temperature, the groups are inferred in order of
their strength.

To show this, we use the recall score for different groups, which shows whether one of the
inferred groups corresponds well to a given real group. To quantify the similarity between a
real group G and an inferred group $G_i$, which are not necessarily of the same size, we can use
the Jaccard score [1], which is defined by:

$$J(G, G_i) = \frac{|G \cap G_i|}{|G \cup G_i|}. \tag{14}$$

The recall score is the maximum of the Jaccard score:

$$R(G) = \max_i J(G, G_i). \tag{15}$$

A recall score close to 1 means that one of the inferred groups $G_i$ is almost identical to
group G. Fig. 4 (below) therefore shows that around $T/T_0 = 0.7$, the group with the biggest
in-connectivity is nearly exactly recovered by one of the groups returned by the algorithm,
whereas the two groups with lower in-connectivity are not. Only by further lowering the
temperature, when $\hat{q} = 3$, are all the groups correctly inferred.
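The Jaccard and recall scores are straightforward to compute when groups are represented as sets of node indices. A minimal sketch, with hypothetical function names:

```python
def jaccard(real_group, inferred_group):
    """Jaccard score |G n G_i| / |G u G_i| between two sets of nodes."""
    a, b = set(real_group), set(inferred_group)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recall(real_group, inferred_groups):
    """Recall score: best Jaccard score over all inferred groups."""
    return max(jaccard(real_group, g) for g in inferred_groups)


# Hypothetical example: the real group {1, 2, 3} is best matched by {1, 2}.
print(recall({1, 2, 3}, [{1, 2}, {3, 4, 5}]))
```

Two merged real groups both get a low recall score, since the single inferred group that contains them is roughly twice as large as either of them.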


B. Running mod-bp with $q \neq q^*$

On networks generated with the SBM, the real number of groups q* is known, and it is thus
interesting to look at what happens when mod-bp is run with $q \neq q^*$. The behaviour for
q = q* is described in ZM and in Fig. 4. If q < q*, mod-bp cannot return the right number of
groups, and will merge some of the real groups together so as to obtain q groups. The more
interesting case is when q is bigger than q*.

First of all, the range of temperatures of the spin glass phase grows as q increases. If
$\epsilon$ is only slightly above the detectability threshold $\epsilon^*$ [HI |H], increasing q
can lead to a situation where there is no recovery phase between the paramagnetic phase and the
spin-glass phase.

However, we will focus on the case when $\epsilon$ is small enough for intermediate phases to
be present. As described previously, the phase transitions are degenerate if $p_{\mathrm{in}}$ is
the same for all groups. Therefore, we generally observe only one intermediate phase, with
$\hat{q} = q^*$. However, this is not always the case and mod-bp can return partitions with
different $\hat{q}$ values, depending on the initialization, similarly to what is observed on
the real network in Fig. 3. Two phenomena can be observed, separately or simultaneously.


[Figure: two panels showing # groups and the recall scores of Group 1, Group 2 and Group 3 as
a function of $T/T_0$.]
FIG. 4. (Color online) Degeneracy of the $T_k$'s on networks generated by the SBM with
N = 5000, q* = 3 and $c_{\mathrm{out}} = 2$. mod-bp was run with q = 3. Top: group 1 has higher
in-connectivity than the two others: $c_{11} = 30$ whereas $c_{22} = c_{33} = 15$. $T_1$ and
$T_2$ are distinct, and from the recall scores we see that only group 1 is detected between
$T_1$ and $T_2$, whereas groups 2 and 3 have an equally low recall score because, in the
partition given by the algorithm, they are merged into a single group. Below $T_2$,
$\hat{q} = 3$ and the algorithm separates groups 2 and 3. Bottom: all 3 groups have the same
in-connectivity $c_{\mathrm{in}} = 30$. There is no $\hat{q} = 2$ phase because $T_1$ and $T_2$
are degenerate. For both experiments, the spin-glass phase is not reached.


The first phenomenon is the one with $\hat{q} = q^* + 1$, where q* of the groups correspond
very well to the real groups, and the last group contains only a very small fraction of nodes.
Depending on the initialization, this last group can even contain no node at all, in which case
it can be simply discarded. This phenomenon is likely to come from the stochasticity of the SBM
and is observed also for large networks with 10® nodes. The modularity of those partitions is
equal to, or slightly higher than, those found in the $\hat{q} = q^*$ phase of mod-bp run with
q = q*, which explains why they are found. On the other hand, we have never observed more than
one of these additional, and almost empty, groups, such that $\hat{q}$ is always at most equal
to $q^* + 1$.
FIG. 5. (Color online) These plots show the inferred number of groups $\hat{q}$ as a function
of the normalized temperature $T/T_0$ and of q, for the “political books” (left) and “political
blogs” (right) networks. The dotted lines mark T = T* (left) and T = $T_0$ (right). The “n.c”
areas correspond to instances that did not reach the convergence criterion (10~®) in 700 and 300
iterations, respectively for the two networks. To take into account coexisting phases, the
algorithm was run for 200 (respectively 50) different initializations at each temperature. The
position of $T_1$ is very stable across the different values of q, and is characterized by a
diverging number of iterations. The other critical temperatures are not always well defined due
to overlaps between phases, and to phase transitions becoming much less sharp; however, up to
$\hat{q} = 4$, the phases stay well separated, with a clear divergence of the number of
iterations. Remarkably, the existence domains of each phase in terms of $T/T_0$ do not vary a
lot with q.


The other phenomenon is that of distinct groups merging together in the retrieval partition,
leading to $\hat{q} < q^*$. Such partitions have lower modularities than partitions with
$\hat{q} = q^*$ (found at the same temperature from a different initialization), showing that
the algorithm is unable to correctly maximize the modularity starting from any initialization.
This is likely due to the existence of “hard but detectable” phases [23], in which frozen
variables cause algorithms to be stuck in suboptimal solutions. A simple way out of this problem
is to run the algorithm several times with different initial conditions, finally selecting the
configuration with the highest modularity found.

These two effects might coexist, and produce retrieval partitions in which two of the real
groups are merged into a single one, and an additional group containing very few or even no
nodes at all is also present. In this case $\hat{q} = q^*$, but the retrieval partition is not
the right one. So the existence of an almost empty group should be considered as a warning on
the reliability of the mod-bp result.


C. Results on real networks 

For community detection on real networks, q* is in general unknown and there is no available
ground truth. From Fig. 3 and the previous section, we know that mod-bp can converge to
partitions with different $\hat{q}$ at the same temperature, depending on the initialization.
This motivates us to run mod-bp several times for each temperature, which allows us to quantify
the probability that a given $\hat{q}$ is found at any given temperature T. Fig. 5 shows the
coexistence of phases in the “political books” [2S] and “political blogs” m datasets for
different values of q. The analysis made in these figures is similar to the one proposed in [28]
for multiresolution community detection.

These figures suggest that, at a given normalized temperature $T/T_0$, the results returned by
mod-bp only marginally depend on the chosen q as long as q > q*. Moreover we observe that,
within a phase with a given number $\hat{q}$ of groups found, the partition $\{\hat{t}\}$ only
marginally depends on the temperature T. Averaging over the several partitions found at
different temperatures and with different initial conditions, we show in Figs. 6 and 7 that
$Q^{\mathrm{MAP}}$ depends essentially on $\hat{q}$, and only minimally on q. As in ZM, we
consider that the largest $\hat{q}$ leading to a significant increase of $Q^{\mathrm{MAP}}$
w.r.t. $\hat{q} - 1$ is a plausible estimate of q*, which agrees well with the commonly accepted
“ground truths” of q* = 3 for “political books” and q* = 2 for “political blogs”.
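The selection rule described above can be written down explicitly once a threshold for a "significant" increase is fixed. The sketch below is one possible reading of that rule: the function name, the threshold `min_gain` and the numbers in the example are hypothetical and are not taken from the experiments.

```python
def estimate_q_star(mean_modularity, min_gain):
    """Largest q_hat whose mean modularity exceeds that of q_hat - 1
    by at least min_gain (a user-chosen, hypothetical threshold).

    mean_modularity: dict mapping q_hat to the mean modularity of the
    partitions found with that effective number of groups.
    """
    estimate = 1
    for q_hat in sorted(mean_modularity):
        previous = mean_modularity.get(q_hat - 1)
        if previous is not None and mean_modularity[q_hat] - previous >= min_gain:
            estimate = q_hat
    return estimate


# Made-up values, for illustration only.
toy = {1: 0.0, 2: 0.44, 3: 0.52, 4: 0.525, 5: 0.526}
print(estimate_q_star(toy, min_gain=0.02))  # 3
```

The outcome depends on the choice of `min_gain`, which is the part of the rule that remains a judgment call for the user.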

FIG. 6. (Color online) $Q^{\mathrm{MAP}}$ as a function of q and $\hat{q}$, using the same
experimental results as in Fig. 5. Symbols represent the mean $Q^{\mathrm{MAP}}$ of all
experiments with a given q resulting in a given $\hat{q}$, along with an error bar representing
the standard deviation. Despite the use of different temperatures, the standard deviations are
very small for each $\hat{q}$. Furthermore, the mean $Q^{\mathrm{MAP}}$ for different q are very
similar, such that we can consider $Q^{\mathrm{MAP}}$ to essentially depend on $\hat{q}$, with
only negligible influence of q and T. The fact that the increase in $Q^{\mathrm{MAP}}$ for
$\hat{q} > 3$ is minimal concords with the fact that q* = 3.

In Fig. 7 we also show the distribution of overlaps between randomly chosen partitions with the
same $\hat{q}$. The overlap between two partitions $\{t\}$ and $\{s\}$ is a number between zero
and one and is defined as

$$O(\{t\}, \{s\}) = \max_{\sigma} \frac{1}{N} \sum_{i=1}^{N} \delta_{t_i, \sigma(s_i)}, \tag{16}$$

where the maximum over all permutations $\sigma$ of $\{1, \dots, \hat{q}\}$ lifts the
permutation symmetry of the group numbering choice. The closer the overlap between two
partitions is to one, the more similar they are. Fig. 7 thus shows that partitions with the same
$\hat{q}$ are very similar to one another, independently of the two parameters of mod-bp, T and
q, for which they were obtained. One may be worried about the double peak structure of the
$\hat{q} = 3$ case and wonder whether the two peaks actually correspond to different community
structures. We have looked at the partitions returned by the algorithm and found the following.
There is always a well conserved group of 520 to 530 nodes, while the remaining roughly 700 nodes
can be clustered in different ways: for $\hat{q} = 3$, there are 2 different partitions with
roughly 600+100 and 500+200 nodes; for $\hat{q} = 4$, the partition is roughly 380+280+40 nodes.
All these configurations have essentially the same modularity. So the conclusion is that the
$\hat{q} = 2$ partition (520+700 nodes) is significant, while further splitting of the cluster
of 700 nodes is not very meaningful.
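The overlap between two partitions used in this comparison can be computed by brute force over label permutations when the number of groups is small. The sketch below assumes the simplest form of the overlap, namely the fraction of nodes with identical labels maximized over all relabelings of one partition; the function name is hypothetical and groups are labelled from 0.

```python
from itertools import permutations


def overlap(t, s, q):
    """Maximum over permutations sigma of {0, ..., q-1} of the fraction
    of nodes i with t[i] == sigma[s[i]] (assumed form of the overlap)."""
    n = len(t)
    best = 0
    for sigma in permutations(range(q)):
        matches = sum(1 for ti, si in zip(t, s) if ti == sigma[si])
        best = max(best, matches)
    return best / n


print(overlap([0, 0, 1, 1], [1, 1, 0, 0], 2))  # 1.0
print(overlap([0, 0, 1, 1], [0, 1, 0, 1], 2))  # 0.5
```

The cost grows as q!, which is acceptable for the small numbers of groups considered here; for larger q an assignment solver would be the natural replacement.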


IV. DISCUSSION 

Apart from the advantage of not requiring the knowledge of the generative model, a further
advantage of mod-bp is that it has only two adjustable parameters, T and q. However, for a given
network, it is not clear how to choose them in order to obtain the optimal partition. The
recommendation of ZM is to run mod-bp at T*(q), defined in Eq. (7), for increasing values of q,
until this no longer leads to a significant increase in modularity. Based on the experiments on
synthetic and real networks presented in this work, we conclude that an important additional
step in this procedure is to calculate the effective number of groups $\hat{q}$ of each
partition returned by the algorithm, which can be different from q. Furthermore, this phenomenon
leads to a new rule for assigning a group to each node, given that some groups might be merged,
which also affects the modularity.
FIG. 7. (Color online) $Q^{\mathrm{MAP}}$ as a function of q and $\hat{q}$, using the same
experimental results as in Fig. 5. Symbols represent the mean $Q^{\mathrm{MAP}}$ of all
experiments with a given q resulting in a given $\hat{q}$, along with an error bar representing
the standard deviation. The fact that $Q^{\mathrm{MAP}}$ does not increase for $\hat{q} > 2$
concords with the fact that q* = 2. Bottom: Empirical distribution of 20000 overlaps between
pairs of partitions with the same $\hat{q}$.


Another possible way to proceed is to run mod-bp with a large value of q, and sweep the
temperature scale from $T_0(q)$ downwards. As T is lowered, the network is clustered into an
increasing number of effective groups $\hat{q}$, and the partitions found have increasing
modularities. Again, the procedure can be stopped once the modularity no longer increases
significantly as $\hat{q}$ is increased.
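The temperature sweep described above can be organized as a simple loop in which each run is warm-started from the messages of the previous temperature. The sketch below only illustrates this bookkeeping: `run_modbp` is a hypothetical callable standing in for an actual mod-bp implementation, assumed to take a temperature and an initial message state and to return the final messages, the effective number of groups and the modularity.

```python
def sweep_temperatures(run_modbp, t_start, t_stop, n_steps):
    """Run a (hypothetical) mod-bp solver on a linear temperature grid.

    run_modbp(temperature, init) -> (messages, q_hat, modularity), where
    init is None for a fresh start or the messages of the previous run.
    Returns a list of (temperature, q_hat, modularity) tuples.
    """
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    results = []
    messages = None  # no warm start for the first temperature
    for step in range(n_steps):
        temperature = t_start + (t_stop - t_start) * step / (n_steps - 1)
        messages, q_hat, modularity = run_modbp(temperature, messages)
        results.append((temperature, q_hat, modularity))
    return results
```

Starting at `t_start` equal to $T_0(q)$ and ending at a lower `t_stop`, the returned list can be scanned for the temperatures at which the effective number of groups changes, and the sweep can be cut short once the modularity stops increasing significantly.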

For real networks, where the generating process is in general not known and not as
straightforward as in the SBM, the number of groups into which to cluster the nodes is in part
left as a choice to the user. In this case, running mod-bp with a quite large value of q and
using T as the parameter to search for the optimal partition seems both desirable and efficient.
To make the optimal choice, in addition to the value of the modularity of a partition with
$\hat{q}$ groups, the range of temperatures where this $\hat{q}$ phase exists might indicate how
relevant it is (as shown in Fig. 5). In particular, if a $\hat{q}$ phase only exists on a narrow
range of temperatures, then it is likely to be less important, because it is less stable with
respect to changes in the model parameter (T in the present case).

Furthermore, as seen on graphs generated by the SBM, it may occur that some groups contain a
very small number of nodes. In this case, merging them with bigger groups will only slightly
change the modularity and result in a more meaningful and stable partition.

















V. CONCLUSION 


In this paper, we have studied the mod-bp algorithm proposed in Ref. [11], focusing on the
influence of the choice of the two adjustable parameters q and T, on both real and synthetic
networks. We have given a more precise picture of the algorithm's behaviour by identifying new
order parameters that make it possible to define several different phases inside the recovery
phase. In each of these phases, mod-bp clusters the nodes into a different number of groups
$\hat{q}$. These phases can either be well separated on the temperature scale and be accompanied
by a divergence in the number of iterations of the algorithm, or coexist in the low temperature
regime. The partitions with the same number $\hat{q}$ of groups typically have high overlaps
among them and very similar modularities. We have proposed a normalized temperature scale
($T/T_0$) on which mod-bp has a very similar behaviour for different values of q. These findings
provide a broader description of the mod-bp algorithm's behaviour, showing its robustness and
effectiveness. Hopefully they can be very useful when mod-bp is run on real networks where the
ground truth is unknown.

Real networks may have hierarchical structures [29-31], and the deeper understanding of the
different recovery phases achieved in this work may help in using the temperature as a simple
parameter to study different levels of the hierarchical clustering with mod-bp. The different
levels of clustering hierarchy may correspond to recovery phases with different values of
$\hat{q}$, obtained by decreasing the temperature.